Estimating speech from lip dynamics
نویسندگان
چکیده
The goal of this project is to develop a limited lip reading algorithm for a subset of the English language. We consider a scenario in which no audio information is available. The raw video is processed and the position of the lips in each frame is extracted. We then prepare the lip data for processing and classify the lips into visemes and phonemes. Hidden Markov Models are used to predict the words the speaker is saying based on the sequences of classified phonemes and visemes. The GRID audiovisual sentence corpus [10][11] database is used for our study.
منابع مشابه
لبخوانی و ادراک گفتار دانشآموزان کمشنوای مدارس ویژۀ کمشنوایان در شهر تهران
Objective: The goal of this study was to evaluate the lip reading ability and Speech perception of hearing impaired students of special schools for the hearing impaired in different speech levels. Materials & Methods: In this cross- sectional study, 44 deaf students (9-12 years old) were selected with multi-stage cluster sampling method, from two special schools for the deaf in Tehran. Tools...
متن کاملVisual analysis of lip coarticulation in VCV utterances
This paper presents an investigation of the visual variation on the bilabial plosive consonant /p/ in three coarticulation contexts. The aim is to provide detailed ensemble analysis to assist coarticulation modelling in visual speech synthesis. The underlying dynamics of labeled visual speech units, represented as lip shape, from symmetric VCV utterances, is investigated. Variation in lip dynam...
متن کاملImproving speaker identification performance in reverberant conditions using lip information
This paper considers the improvment of speaker identification performance in reverberant conditions using additional lip information. Automatic speaker identification (ASI) using speech characteristics alone can be highly successful, however problems occur with mis-matches between training and testing conditions. In particular, we find that ASI performance drops dramatically when given anechoic...
متن کاملVisual analysis of viseme dynamics
Face to face dialogue is the most natural mode of communication between humans. The combination of human visual perception of expression and perception in changes in intonation provides semantic information that communicates idea, feelings and concepts. The realistic modelling of speech movements, through automatic facial animation, and maintaining audio-visual coherence is still a challenge in...
متن کاملSpeech intelligibility after repair of cleft lip and palate
Background: Intelligibility refers to understandability of speech; and lack of it can negatively affect children’s overall communication effectiveness. Children with repaired cleft lip and/or cleft palate (CL/P) may experience poor speech intelligibility. This study aimed at evaluating speech intelligibility in children with repaired CL/P who had not been referred to sp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1708.01198 شماره
صفحات -
تاریخ انتشار 2017